Methods for reading data from a storage buffer including delaying activation of a column select

ABSTRACT

Disclosed are methods for reading data from a storage buffer. One such method may include retrieving a first set of data during a first period of time. The method may also include delaying data retrieval during a second period of time after the first period of time. The method may include outputting at least a portion of the first set of data during the first period of time and the second period of time. The first period of time is substantially similar to the second period of time.

BACKGROUND

1. Field of Invention

Embodiments of the invention relate generally to the field of memory devices. More specifically, embodiments of the present invention may provide one or more techniques for reading data from a storage buffer.

2. Description of Related Art

Computer systems are generally employed in numerous configurations to provide a variety of computing functions. Processing speeds, system flexibility, and size constraints are typically considered by design engineers tasked with developing computer systems and system components. Computer systems generally include a plurality of memory devices which may be used to store data (e.g., programs and user data) and which may be accessible to other system components such as processors or peripheral devices. Such memory devices may include volatile and non-volatile memory devices.

Typically, a memory device, such as a synchronous dynamic random access memory (SDRAM), includes a memory array divided into a plurality of memory banks, or other divisions. Based upon addressing information received by the memory device during operation, data may be stored into and read out of appropriate banks of the memory array. For example, during operation of SDRAM, an activate (e.g., active) command may be sent to the memory array. The activate command activates a row of the memory array. Further, a column select command may be sent to the memory array. The column select command selects a column of the memory array. With the row activated and the column selected, data may be retrieved from selected memory cells of the memory array.

In certain architectures, a memory device or a portion of a memory device may be used as a storage buffer. When data is read from the storage buffer, it may be beneficial for the data to be output seamlessly (e.g., without interruption). However, in some cases, a column select of the memory device retrieves more data than is desirable. Therefore, the storage buffer may use registers to temporarily hold data before the data is output, thereby seamlessly outputting data. As will be appreciated, there may be a large number of registers. For example, the storage buffer may use 1024 or 2048 registers. Such a large number of registers may adversely affect the cost of the storage buffer. Further, a die manufactured to hold the registers may take up a significant amount of space.

Accordingly, embodiments of the present invention may be directed to one or more of the problems set forth above.

BRIEF DESCRIPTION OF DRAWINGS

FIG. 1 illustrates a block diagram of a processor-based device in accordance with embodiments of the present invention;

FIG. 2 is a partial schematic illustration of an integrated circuit, incorporating an array of memory cells fabricated in accordance with embodiments of the present invention;

FIG. 3 illustrates a partial functional block diagram of an architecture of a storage buffer in accordance with embodiments of the present invention;

FIG. 4 illustrates a timing diagram of data retrieval from a storage buffer in accordance with embodiments of the present invention;

FIG. 5 illustrates a flowchart of a method for reading data from a storage buffer in accordance with embodiments of the present invention; and

FIG. 6 illustrates a block diagram of a state machine engine that may include a storage buffer in accordance with embodiments of the present invention.

DETAILED DESCRIPTION

Some of the subsequently discussed embodiments may facilitate the manufacture of storage buffers with a limited number of registers, thereby conserving space and cost. As is described in detail below, the number of registers may be reduced by stalling column cycles during an array access. For example, a first set of data may be retrieved during a first period of time. The first set of data may be twice the amount of data that can be output during the first period of time. Therefore, following the first period of time, data retrieving may be delayed during a second period of time to allow the remaining portion of the first set of data to be output. As will be appreciated, in certain embodiments, the first period of time may be substantially similar to the second period of time. As such, the following discussion describes devices and methods in accordance with embodiments of the present technique.

Turning now to the drawings, and referring initially to FIG. 1, a block diagram depicting a processor-based system, generally designated by reference numeral 10, is illustrated. The system 10 may be any of a variety of types such as a computer, pager, cellular phone, personal organizer, control circuit, etc. In a typical processor-based device, one or more processors 12, such as a microprocessor, control the processing of system functions and requests in the system 10. As will be appreciated, the processor 12 may include an embedded North or South bridge (not shown), for coupling each of the aforementioned components thereto. Alternatively, the bridges may include separate bridges coupled between the processor 12 and the various components of the system 10.

The system 10 typically includes a power supply 14. For instance, if the system 10 is a portable system, the power supply 14 may advantageously include permanent batteries, replaceable batteries, and/or rechargeable batteries. The power supply 14 may also include an AC adapter, so the system 10 may be plugged into a wall outlet, for instance. In addition, the power supply 14 may include a DC adapter such that the system 10 may be plugged into a vehicle cigarette lighter, for instance. Various other devices may be coupled to the processor 12 depending on the functions that the system 10 performs. For instance, a user interface 16 may be coupled to the processor 12. The user interface 16 may include buttons, switches, a keyboard, a light pen, a mouse, and/or a voice recognition system, for instance. A display 18 may also be coupled to the processor 12. The display 18 may include an LCD display, a CRT, LEDs, and/or an audio display, for example. In some embodiments, the display 18 may be part of the user interface 16 (e.g., touch screen tablets). Furthermore, an RF sub-system/baseband processor 20 may also be coupled to the processor 12. The RF sub-system/baseband processor 20 may include an antenna that is coupled to an RF receiver and to an RF transmitter (not shown). One or more communication ports 22 may also be coupled to the processor 12. The communication port 22 may be adapted to be coupled to one or more peripheral devices 24 such as a modem, a printer, a computer, or to a network, such as a local area network, remote area network, intranet, or the Internet, for instance.

Because the processor 12 generally controls the functioning of the system 10 by implementing software programs, memory is operably coupled to the processor 12 to store and facilitate execution of various programs. For instance, the processor 12 may be coupled to the volatile memory 26 which may include Dynamic Random Access Memory (DRAM) and/or Static Random Access Memory (SRAM). The volatile memory 26 may include a number of memory modules, such as single inline memory modules (SIMMs) or dual inline memory modules (DIMMs). As can be appreciated, the volatile memory 26 may simply be referred to as the “system memory.” The volatile memory 26 is typically quite large so that it can store dynamically loaded data (e.g., applications).

The processor(s) 12 may also be coupled to non-volatile memory 28. The non-volatile memory 28 may include a read-only memory (ROM), such as an EPROM, and/or flash memory to be used in conjunction with the volatile memory. The size of the ROM is typically selected to be just large enough to store any necessary operating system, application programs, and fixed data. Additionally, the non-volatile memory 28 may include a high capacity memory such as a tape or disk drive memory. As will be appreciated, the volatile memory 26, or the non-volatile memory 28 may be considered a non-transitory tangible machine-readable medium for storing code (e.g., instructions).

One or more components of the system 10 may include a storage buffer that functions in accordance with embodiments described herein. Some examples of devices that may be part of a storage buffer are illustrated in FIGS. 2-3. Specifically, FIG. 2 illustrates an array of memory cells that may be part of the storage buffer, and FIG. 3 illustrates a functional block diagram that may be associated with the architecture of the storage buffer. FIGS. 4-5 describe the timing of signals of the storage buffer and methods of operating the storage buffer.

Referring now to FIG. 2, a partial schematic illustration of an integrated circuit, such as a memory device 29, which may be implemented in the volatile memory 26, is illustrated. The memory device 29 includes an array of memory cells which may be part of a storage buffer operating in accordance with the techniques described herein. In some embodiments, the memory device 29 may comprise a dynamic random access memory (DRAM) device. The memory device 29 includes a number of memory cells 30 arranged in a grid pattern and comprising a number of rows and columns. The number of memory cells 30 (and corresponding rows and columns) may vary depending on system requirements and fabrication technology. Each memory cell 30 may include an access device (e.g., a MOSFET 32), and a storage device (e.g., a capacitor 34). In certain embodiments, the memory cell 30 may not include an access device (e.g., some cross-point memories). In other embodiments, the memory cell 30 may include an access device that is part of its storage device (e.g., 1T0C devices, such as floating body devices). The MOSFET 32 includes a drain terminal 36, a source terminal 38, and a gate 40. The capacitor 34 is coupled to the source terminal 38. The terminal of the capacitor 34 that is not coupled to the MOSFET 32 may be coupled to a ground plane. As described further below, the drain 36 is coupled to a bit line (BL) and the gate 40 is coupled to a word line (WL).

It should be noted that although the above description depicts the terminal of the access device coupled to the capacitor 34 as the “source” 38 and the other non-gate terminal of the access device as the “drain” 36, during read and write operations, the MOSFET 32 may be operated such that each of the terminals 36 and 38 operates at one time or another as a source or a drain. Accordingly, for purposes of further discussion it should be recognized that whenever a terminal is identified as a “source” or a “drain,” it is only for convenience and that in fact during operation of the MOSFET 32 either terminal could be a source or a drain depending on the manner in which the MOSFET 32 is being controlled by the voltages applied to the terminals 36, 38 and 40. In addition, it will be appreciated that embodiments of a memory device 29 may include p-type MOSFETs, n-type MOSFETs, or a combination of both.

As previously described, the memory array is arranged in a series of rows and columns. To implement the data storage capabilities of a memory cell 30, an electrical charge is placed on the drain 36 of the MOSFET 32 via a bit line (BL). By controlling the voltage at the gate 40 via the word line (WL), the depletion region between the gate 40 and the channel may be narrowed such that the electrical charge at the drain 36 can flow to the capacitor 34. By storing electrical charge in the capacitor 34, the charge may be interpreted as a binary data value in the memory cell 30. For instance, for a single-bit storage device, a positive charge above a known threshold voltage stored in the capacitor 34 may be interpreted as binary “1.” If the charge in the capacitor 34 is below the threshold value, a binary value of “0” is said to be stored in the memory cell 30.

The bit lines BL are used to read and write data to and from the memory cells 30. The word lines WL are used to activate the MOSFET 32 to access a particular row of memory cells 30. Accordingly, the memory device 29 also includes a periphery portion which may include an address buffer 42, row decoder 44 and column decoder 46. The row decoder 44 and column decoder 46 selectively access the memory cells 30 in response to address signals that are provided on the address bus 48 during read, write, and refresh operations. The address signals are typically provided by an external controller such as a microprocessor, or another type of memory controller, but in some embodiments the address signals may be internally generated. The column decoder 46 may also include sense amplifiers and input/output circuitry to further facilitate the transfer of data to and from the memory cells 30 via the bit lines BL.

In one mode of operation, the memory device 29 receives the address of a particular memory cell(s) 30 at the address buffer 42. The address buffer 42 passes the address to the row decoder 44. The row decoder 44 selectively activates the particular word line WL identified by the address to activate the MOSFETs 32 of each memory cell 30 that is connected to the selected word line WL. The column decoder 46 selects the bit line (or bit lines) BL of the memory cell(s) 30 corresponding to the address. For a write operation, data received by the input/output circuitry is coupled to the selected bit line (or bit lines) BL and provides for the charge or discharge of the capacitor 34 of the selected memory cell(s) 30 through the activated MOSFET 32. The charge typically corresponds to binary data, as previously described. For a read operation, data stored in the selected memory cell(s) 30, represented by the charge stored in the capacitor(s) 34, is coupled to the selected bit line (or bit lines) BL, amplified by the sense amplifier and a corresponding voltage level is provided to the input/output circuitry in the column decoder 46.
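By way of a non-limiting illustration, the read and write sequence described above may be modeled in software. The following Python sketch is not part of the memory device 29. The class name `ToyMemoryArray`, the normalized charge values, and the 0.5 threshold are assumptions made for the example, and the sketch ignores sense amplification, refresh, and the destructive nature of DRAM reads.

```python
class ToyMemoryArray:
    """Hypothetical software model of a grid of single-bit memory cells."""

    def __init__(self, rows: int, cols: int, threshold: float = 0.5):
        # One normalized "charge" value per cell (0.0 = discharged).
        self.charge = [[0.0] * cols for _ in range(rows)]
        self.threshold = threshold  # assumed threshold, for illustration only

    def write(self, row: int, columns: list, bits: list) -> None:
        # The row selects a word line; the columns select the bit lines.
        for col, bit in zip(columns, bits):
            self.charge[row][col] = 1.0 if bit else 0.0

    def read(self, row: int, columns: list) -> list:
        # Charge above the threshold is interpreted as binary 1, else 0.
        return [1 if self.charge[row][col] > self.threshold else 0
                for col in columns]


array = ToyMemoryArray(rows=2, cols=8)
array.write(row=0, columns=[0, 1, 2, 3], bits=[1, 0, 1, 1])
print(array.read(row=0, columns=[0, 1, 2, 3]))
```

In this model, the row argument plays the role of the address handled by the row decoder 44 and the column list plays the role of the bit lines selected by the column decoder 46.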

As described below, a memory device 29 may be part of a storage buffer operating in accordance with the techniques described herein and may have a smaller die size than other storage buffers. For example, memory device 29 may be part of a storage buffer that includes a limited number of registers. Furthermore, because the storage buffer includes a limited number of registers, the cost to manufacture the storage buffer may be reduced.

Referring now to FIG. 3, a partial functional block diagram of an architecture of a storage buffer 50 is illustrated. The storage buffer 50 includes multiple memory banks 52, 54, 56, and 58. In certain embodiments, the storage buffer 50 may include sixteen memory banks (e.g., in an ×16 configuration). As will be appreciated, each of the memory banks 52, 54, 56, and 58 includes a memory array having a plurality of memory cells 30. Furthermore, each of the memory banks 52, 54, 56, and 58 may be coupled to sense amplifiers for amplifying data read from the memory banks. It should be noted that in certain embodiments, the storage buffer 50 may be only a portion of another memory device. A row address 60 is used to select a row (e.g., activate a word line) of the storage buffer 50, thereby activating a certain number of memory cells 30 for performing a read and/or write operation. For example, activating a word line may activate 2048 memory cells 30 for performing a read operation. Further, a column address 62 is used to select a column (e.g., one or more bit lines) of the storage buffer 50 for writing data to and/or reading data from the memory cells 30. It should be noted that selecting one or more bit lines of the storage buffer 50 may select multiple memory cells 30 for reading and/or writing concurrently. For example, selecting one or more bit lines may select 256 memory cells 30 to be concurrently read from. In certain embodiments, the row address 60 or column address 62 may be used to select one of the memory banks 52, 54, 56, and 58 to be accessed. In other embodiments, a separate bank address may be used to select one of the memory banks 52, 54, 56, and 58 to be accessed.

During a memory read, data may be transferred from the memory banks 52, 54, 56, and 58 to registers 64. The registers 64 may include any number of data storage locations (e.g., latches, etc.) for temporarily storing data. For example, the registers 64 may include approximately 384 data storage locations. Data may be transferred from the registers 64 via a data bus 66 to data output circuitry 68 which conditions the data for being output from the storage buffer 50. In certain embodiments, the data bus 66 may be a 128 bit data bus for concurrently transferring 128 bits from the registers 64. The data output circuitry 68 provides a data output 70 via data nodes (e.g., pins) 72 (e.g., DQ(15:0)). Therefore, the data output circuitry 68 is limited to outputting data based on the number of output data nodes 72. For example, in a storage buffer 50 with 16 data nodes, 16 bits of data may be output at a time.

Using the storage buffer 50 as described above, data may be output from the storage buffer 50 seamlessly. For example, when a read request is made to the storage buffer 50, data may be transferred out via the output data nodes 72 without interruptions. Further, the storage buffer 50 may be designed to include a limited number of registers to reduce manufacturing cost and to obtain a limited die size. In certain embodiments, the storage buffer 50 may be designed to completely eliminate the registers 64.

Turning to FIG. 4, a timing diagram 80 of data retrieval from the storage buffer 50 is illustrated. The timing diagram 80 includes timing relating to a clock 82, an activate command 84, a read command 86, a row address 88, a wordline_0 90, a column select 92, a column address 94, a register data out 96, and a buffer data out 98. The clock 82 provides a timing signal to synchronize the operations of the storage buffer 50. As illustrated, the clock 82 consistently provides an alternating signal (e.g., logic low, logic high, logic low, logic high, etc.) during operation of the storage buffer 50. The clock 82 may operate at any suitable frequency. For example, the clock 82 may operate at 500 MHz, 750 MHz, 800 MHz, 1.000 GHz, 1.150 GHz, 1.500 GHz, and so forth.

The activate command 84 is used to activate (e.g., open) a row of memory cells 30 in the storage buffer 50. In certain embodiments, the activate command 84 may activate the row of memory cells 30 within a selected bank of the storage buffer 50. A pulse 100 illustrates the activate command 84 being applied to the storage buffer 50. At the time the pulse 100 is applied, the row address 88 is set to “0.” Therefore, wordline_0 90 is activated and transitions at a time 102 to a logic high 104. Accordingly, wordline_0 90 is activated from the time 102 through the remaining time shown in the timing diagram 80. As such, the wordline_0 90 is activated for performing read and/or write operations. As will be appreciated, the wordline_0 90 may activate a specific number of memory cells 30 that corresponds to the particular architecture of the storage buffer 50. For example, the wordline_0 90 may activate 2048 memory cells 30.

The read command 86 is used to retrieve data from a selected column of memory cells 30 in the storage buffer 50. A pulse 106 illustrates the read command 86 being applied to the storage buffer 50. After the pulse 106 is issued, the column select 92 transitions to a logic high as illustrated by a pulse 108. While the pulse 108 is applied, the column address 94 is set to “0” (e.g., segment 110). Therefore, data is transferred from row “0,” column “0” of the storage buffer 50 into the registers 64. In certain embodiments, with a single column select 92, 256 bits of data are transferred into the registers 64. Data is then transferred out of the registers 64 to the data output circuitry 68 responsive to a pulse 112 of the register data out 96. As will be appreciated, data may be transferred out of the registers 64 at a rate that is less than (e.g., half) the rate that data is transferred into the registers 64. For example, data may be transferred out of the registers 64 (e.g., data output rate) in sets of 128 bits over a period of time of approximately 3.5 ns (e.g., 4.57 GB/s), while data may be transferred into the registers 64 (e.g., retrieval rate) in sets of 256 bits over a period of time of approximately 3.5 ns (e.g., 9.14 GB/s). The data may be transferred out of the registers 64 to the data output circuitry 68 using the data bus 66 (e.g., 128 bit data bus). A series of pulses 114 illustrate data being output from the data output circuitry 68 onto data nodes (e.g., output data nodes 72). As discussed above, data may be output 16 bits at a time. For example, the series of pulses 114 includes four pulses. Data may be output onto the 16 output data nodes 72 with each rising edge (i.e., four times) and each falling edge (i.e., four times) of the series of pulses 114. Thus, 128 bits of data may be output during the series of pulses 114 (i.e., 16×8=128). Accordingly, during the series of pulses 114, the data from the data output circuitry 68 will be output from the storage buffer 50 onto the output data nodes 72.
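As an illustrative check of the example figures given above, the data rates and the number of bits output per series of pulses may be recomputed as follows. This is a sketch of the arithmetic only. The constant and function names are hypothetical, and 1 GB is assumed to mean 10^9 bytes.

```python
# Illustrative arithmetic only; all names below are hypothetical.
ARRAY_CYCLE_NS = 3.5   # approximate array cycle time from the example
BITS_IN = 256          # bits transferred into the registers per column select
BITS_OUT = 128         # bits transferred out of the registers per array cycle
DATA_NODES = 16        # output data nodes, e.g., DQ(15:0)
CLOCK_PULSES = 4       # pulses in one series of pulses (e.g., series 114)


def rate_gb_per_s(bits: int, period_ns: float) -> float:
    """Convert bits per period into gigabytes per second (1 GB = 1e9 bytes)."""
    return (bits / 8) / period_ns  # bytes per nanosecond equals GB/s


retrieval_rate = rate_gb_per_s(BITS_IN, ARRAY_CYCLE_NS)
output_rate = rate_gb_per_s(BITS_OUT, ARRAY_CYCLE_NS)
# Data is output on both the rising and the falling edge of each pulse.
bits_per_series = DATA_NODES * CLOCK_PULSES * 2

print(f"retrieval rate: {retrieval_rate:.2f} GB/s")
print(f"output rate:    {output_rate:.2f} GB/s")
print(f"bits per series of pulses: {bits_per_series}")
```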

A pulse 116 illustrates the read command 86 again being applied to the storage buffer 50. After the pulse 116 is issued, the column select 92 transitions to a logic high as illustrated by a pulse 118. It should be noted that, in certain embodiments, the time from the rising edge of the pulse 108 to the rising edge of the pulse 118 (e.g., array cycle time 119) may be approximately 3.5 ns. Likewise, the time from the falling edge of the pulse 108 to the falling edge of the pulse 118 may be approximately 3.5 ns. In other embodiments, the array cycle time 119 may be 2 ns, 4 ns, 5 ns, 10 ns, and so forth. While the pulse 118 is applied, the column address 94 is set to “1” (e.g., segment 120). Therefore, data is transferred from row “0,” column “1” of the storage buffer 50 into the registers 64.

As will be appreciated, just prior to the pulse 118, the registers 64 may contain approximately half of the data transferred from row “0,” column “0” of the storage buffer 50. For example, if 256 bits of data were transferred from row “0,” column “0” into the registers 64, only 128 bits of data may remain in the registers 64 due to 128 bits of data being transferred from the registers 64 to the data output circuitry 68 responsive to pulse 112. Therefore, the registers 64 may have 256 available temporary storage locations to store the data transferred from row “0,” column “1.” Accordingly, the registers 64 may include one and a half times the amount of storage locations that are used for transferring data from a single column of a row (e.g., 1.5×256=384). Data is transferred out of the registers 64 to the data output circuitry 68 responsive to a pulse 122 of the register data out 96. A series of pulses 124 illustrate data again being output from the data output circuitry 68 onto data nodes (e.g., output data nodes 72). Accordingly, during the series of pulses 124, the data from the data output circuitry 68 will be output from the storage buffer 50 onto the output data nodes 72.
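The register sizing described above may be illustrated with a simplified occupancy model. The following Python sketch assumes that each column select deposits 256 bits at the start of an array cycle and that 128 bits are drained during every array cycle. The function name `peak_register_occupancy` and this cycle-level granularity are assumptions made for the example and do not represent actual circuit timing.

```python
def peak_register_occupancy(pattern, bits_in=256, bits_out=128):
    """Return (peak, final) register occupancy in bits.

    pattern: one boolean per array cycle, True when a column select is
    activated in that cycle and False when it is delayed.
    """
    occupancy = 0
    peak = 0
    for select_active in pattern:
        if select_active:
            occupancy += bits_in              # data retrieved from the array
        peak = max(peak, occupancy)
        occupancy -= min(bits_out, occupancy)  # data sent to output circuitry
    return peak, occupancy


# Two column selects followed by two delayed cycles.
print(peak_register_occupancy([True, True, False, False]))
# Four back-to-back column selects with no delay, for comparison.
print(peak_register_occupancy([True, True, True, True]))
```

Under these assumptions, the pattern with two delayed cycles never holds more than 384 bits and ends empty, whereas uninterrupted column selects cause the occupancy to keep growing.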

Furthermore, a pulse 126 illustrates the read command 86 again being applied to the storage buffer 50. After the pulse 126 is issued, the column select 92 does not transition, as illustrated by a segment 128. In certain embodiments, the column select 92 may not transition for a total of two array cycle times beyond an array cycle time 129 that includes pulse 118 (e.g., the array cycle time 129 may be approximately twice the time of the pulse 118). For example, during segment 128, the column select 92 may not transition for 7.0 ns after the array cycle time 129, or 8.75 ns after the pulse 118. In other embodiments, the column select 92 may not transition for 4 ns, 8 ns, 10 ns, 20 ns, and so forth, after the array cycle time 129. With the column select 92 not transitioning during the segment 128, no data is transferred from the storage buffer 50 into the registers 64. Thus, during the segment 128, the registers 64 are able to transfer out any remaining data stored thereon. Accordingly, data is transferred out of the registers 64 to the data output circuitry 68 responsive to a pulse 130 of the register data out 96. A series of pulses 132 illustrate data again being output from the data output circuitry 68 onto data nodes (e.g., output data nodes 72). Therefore, during the series of pulses 132, the data from the data output circuitry 68 will be output from the storage buffer 50 onto the output data nodes 72.

Likewise, a pulse 134 illustrates the read command 86 again being applied to the storage buffer 50. After the pulse 134 is issued, the column select 92 again does not transition, as illustrated by the segment 128. Accordingly, data is transferred out of the registers 64 to the data output circuitry 68 responsive to a pulse 136 of the register data out 96. A series of pulses 138 illustrate data again being output from the data output circuitry 68 onto data nodes (e.g., output data nodes 72). Therefore, during the series of pulses 138, the data from the data output circuitry 68 will be output from the storage buffer 50 onto the output data nodes 72.

A pulse 140 illustrates the read command 86 again being applied to the storage buffer 50. This read command 86 initiates a repetition of the cycle that started with the pulse 106. After the pulse 140 is issued, the column select 92 transitions to a logic high as illustrated by a pulse 142. While the pulse 142 is applied, the column address 94 is set to “2” (e.g., segment 144). Therefore, data is transferred from row “0,” column “2” of the storage buffer 50 into the registers 64. Data is then transferred out of the registers 64 to the data output circuitry 68 responsive to a pulse 146 of the register data out 96. A series of pulses 148 illustrate data being output from the data output circuitry 68 onto data nodes (e.g., output data nodes 72). Accordingly, during the series of pulses 148, the data from the data output circuitry 68 will be output from the storage buffer 50 onto the output data nodes 72.

Furthermore, a pulse 150 illustrates the read command 86 again being applied to the storage buffer 50. After the pulse 150 is issued, the column select 92 transitions to a logic high as illustrated by a pulse 152. While the pulse 152 is applied, the column address 94 is set to “3” (e.g., segment 154). Therefore, data is transferred from row “0,” column “3” of the storage buffer 50 into the registers 64. Data is then transferred out of the registers 64 to the data output circuitry 68 responsive to a pulse 156 of the register data out 96. A series of pulses 158 illustrate data again being output from the data output circuitry 68 onto data nodes (e.g., output data nodes 72). Accordingly, during the series of pulses 158, the data from the data output circuitry 68 will be output from the storage buffer 50 onto the output data nodes 72.

A pulse 160 illustrates the read command 86 again being applied to the storage buffer 50. After the pulse 160 is issued, the column select 92 does not transition, as illustrated by a segment 162. In certain embodiments, the column select 92 may not transition for a total of two array cycle times beyond an array cycle time (e.g., approximately 3.5 ns) that includes pulse 152. For example, during segment 162, the column select 92 may not transition for 7.0 ns after the array cycle time that includes pulse 152, or 8.75 ns after the pulse 152. With the column select 92 not transitioning during the segment 162, no data is transferred from the storage buffer 50 into the registers 64. Thus, during the segment 162, the registers 64 are able to transfer out any remaining data. Accordingly, data is transferred out of the registers 64 to the data output circuitry 68 responsive to a pulse 164 of the register data out 96. A series of pulses 166 illustrate data again being output from the data output circuitry 68 onto data nodes (e.g., output data nodes 72). Therefore, during the series of pulses 166, the data from the data output circuitry 68 will be output from the storage buffer 50 onto the output data nodes 72.

Likewise, a pulse 168 illustrates the read command 86 again being applied to the storage buffer 50. After the pulse 168 is issued, the column select 92 again does not transition, as illustrated by the segment 162. Accordingly, data is transferred out of the registers 64 to the data output circuitry 68 responsive to a pulse 170 of the register data out 96. A series of pulses 172 illustrate data again being output from the data output circuitry 68 onto data nodes (e.g., output data nodes 72). Therefore, during the series of pulses 172, the data from the data output circuitry 68 will be output from the storage buffer 50 onto the output data nodes 72.

As described above, data may be read from the storage buffer 50 by activating a first column select followed by activating a second column select (e.g., this may take a total of approximately 7.0 ns). Because data is retrieved from the columns at twice the rate that data is transferred out of the storage buffer 50, no column select is applied for two array cycle times (e.g., approximately 7.0 ns). This pattern is then repeated until all data is read out of a particular word line of the storage buffer 50. For example, the timing diagram 80 as illustrated may read out approximately 1024 bits of data over a time period of approximately 28 ns. In certain embodiments, the timing diagram 80 may represent timing for outputting only half of the data from wordline_0 90. In such an embodiment, the timing diagram 80 may be repeated for more columns (e.g., data may be transferred from columns 4 through 7 in a similar manner as described above). As such, the storage buffer 50 may output approximately 2048 bits of data over a time period of approximately 56 ns. Further, the same method may be repeated with each word line. Using such a method, data may be seamlessly output from the storage buffer 50. Further, die space for the storage buffer 50 may be minimized enabling the storage buffer 50 to be manufactured at a lower cost than other storage buffers.
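The repeating pattern summarized above may be expressed as a simple schedule. The following Python sketch is offered only as an illustration. The function name `read_schedule` and its parameters are hypothetical, and the sketch assumes an array cycle time of approximately 3.5 ns with 128 bits output per array cycle, as in the example above.

```python
def read_schedule(num_columns, cycle_ns=3.5, selects=2, stalls=2):
    """Return a list of (start_ns, column) tuples, one per array cycle.

    column is the column address for a cycle in which a column select is
    activated, or None for a cycle in which the column select is delayed.
    """
    schedule = []
    column = 0
    cycle = 0
    while column < num_columns:
        for _ in range(selects):
            if column < num_columns:
                schedule.append((cycle * cycle_ns, column))
                column += 1
                cycle += 1
        for _ in range(stalls):
            schedule.append((cycle * cycle_ns, None))
            cycle += 1
    return schedule


schedule = read_schedule(num_columns=4)
for start_ns, column in schedule:
    if column is not None:
        action = f"column select activated, column address {column}"
    else:
        action = "column select delayed"
    print(f"{start_ns:5.1f} ns: {action}")

total_ns = len(schedule) * 3.5    # total time for the schedule
total_bits = len(schedule) * 128  # 128 bits are output in every array cycle
print(f"{total_bits} bits over {total_ns} ns")
```

With four columns the sketch yields eight array cycles (1024 bits over 28 ns), and with eight columns it yields sixteen array cycles (2048 bits over 56 ns), matching the example figures above.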

As will be appreciated, in certain embodiments (e.g., an ×4 configuration), data may be read from the storage buffer 50 at four times the rate that data is transferred out of the storage buffer 50. Accordingly, data may be read from the storage buffer 50 by activating a first column select followed by activating a second column select. To transfer all of the data read out of the storage buffer 50 during the first and second column selects, no column select may be applied for six array cycle times (i.e., the two column selects retrieve eight array cycles' worth of output data, two cycles of which are output while the column selects are active). Furthermore, the same pattern of reading data and delaying column selects may be applied to any type of storage buffer 50 configuration.
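The length of the delay for a given configuration may be derived from the ratio of the retrieval rate to the output rate. The following Python sketch shows that bookkeeping under two stated assumptions: data is output at a constant rate in every array cycle, including the cycles in which a column select is active, and all retrieved data is output before the next group of column selects begins. The function name `stall_cycles` is hypothetical.

```python
def stall_cycles(selects: int, rate_ratio: int) -> int:
    """Array cycles with no column select after `selects` column selects.

    rate_ratio is the retrieval rate divided by the output rate. The
    selects retrieve selects * rate_ratio cycles' worth of output data,
    of which `selects` cycles' worth is output while the selects run.
    """
    return selects * rate_ratio - selects


for ratio in (2, 4):
    print(f"rate ratio {ratio}: delay {stall_cycles(2, ratio)} array cycles")
```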

Referring now to FIG. 5, a flowchart of a method 180 for retrieving (e.g., fetching) data from a storage buffer 50 is illustrated. At block 182, a first column select is activated during a first period of time to retrieve a first set of data. As will be appreciated, the first period of time may include the time it takes for a full array cycle, including activating and deactivating the first column select. For example, the first period of time may be approximately 3.5 ns (e.g., 1.75 ns activated and 1.75 ns deactivated). Further, the first set of data may be data including any number of bits. For example, the first set of data may include 256 bits.

Next, at block 184, a second column select is activated during a second period of time to retrieve a second set of data. Again, as will be appreciated, the second period of time may include the time it takes for a full array cycle, including activating and deactivating the second column select. For example, the second period of time may be approximately 3.5 ns (e.g., 1.75 ns activated and 1.75 ns deactivated). Further, the second set of data may be data including any number of bits. For example, the second set of data may include 256 bits.

Then, at block 186, activation of a column select is delayed during a third period of time to inhibit additional data retrieval (e.g., to allow the first and second sets of data to be output from the storage buffer 50). It should be noted that the third period of time may be substantially the same as a sum of the first period of time and the second period of time. For example, if the first period of time is approximately 3.5 ns and the second period of time is approximately 3.5 ns, the third period of time may be approximately 7.0 ns.

At block 188, a third column select is activated during a fourth period of time to retrieve a third set of data. Next, at block 190, a fourth column select is activated during a fifth period of time to retrieve a fourth set of data. Then, at block 192, activation of a column select is delayed during a sixth period of time to inhibit additional data retrieval (e.g., to allow the third and fourth sets of data to be output from the storage buffer 50).

Furthermore, at block 194, a fifth column select is activated during a seventh period of time to retrieve a fifth set of data. Next, at block 196, a sixth column select is activated during an eighth period of time to retrieve a sixth set of data. Then, at block 198, activation of a column select is delayed during a ninth period of time to inhibit additional data retrieval (e.g., to allow the fifth and sixth sets of data to be output from the storage buffer 50).

In addition, at block 200, a seventh column select is activated during a tenth period of time to retrieve a seventh set of data. Next, at block 202, an eighth column select is activated during an eleventh period of time to retrieve an eighth set of data. Then, at block 204, activation of a column select is delayed during a twelfth period of time to inhibit additional data retrieval (e.g., to allow the seventh and eighth sets of data to be output from the storage buffer 50).

As will be appreciated, a sum of the twelve periods of time may be a total time that it takes to retrieve all of the data from a word line (e.g., wordline_0 90). In certain embodiments, the sum of the twelve periods of time may be approximately 56 ns. In other embodiments, the sum of the twelve periods of time may be less than or greater than 56 ns. After data is retrieved from the word line, the storage buffer 50 may be precharged during a period of time (e.g., to be ready to retrieve data from another word line). Further, some embodiments may include only blocks 182 through 186, while other embodiments may include only blocks 182 through 192. It should be noted that the method 180 may include fewer or more blocks than illustrated.
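The twelve periods of the method 180 may be laid out as a simple schedule. The following Python sketch is a minimal illustration that assumes the example 3.5 ns array cycle; the function name and the tuple layout are hypothetical and are not part of the described method.

```python
from typing import List, Tuple

CYCLE_NS = 3.5  # example array cycle time


def build_schedule(groups: int = 4) -> List[Tuple[int, str, float]]:
    """Return (block number, action, duration in ns) for each period of method 180."""
    schedule = []
    block = 182   # first block of the flowchart; later blocks are numbered in steps of 2
    select = 1
    for _ in range(groups):
        for _ in range(2):
            schedule.append((block, f"activate column select {select}", CYCLE_NS))
            block += 2
            select += 1
        # The delay lasts as long as the two preceding retrieval periods combined.
        schedule.append((block, "delay column select", 2 * CYCLE_NS))
        block += 2
    return schedule


schedule = build_schedule()
for block, action, duration in schedule:
    print(f"block {block}: {action} ({duration} ns)")
print("total:", sum(duration for _, _, duration in schedule), "ns")
```

With four groups, the schedule runs from block 182 through block 204 and the durations sum to 56 ns, matching the example total given above.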

Using the method 180 described above, data may be retrieved from the storage buffer 50 at a retrieval rate that is greater than the output rate of the storage buffer 50. As such, the retrieval of data from the storage buffer 50 may be delayed to allow the output of the storage buffer 50 time to output the data. As will be appreciated, the ideas presented above may be applied to any mismatch between the retrieval rate and the output rate of the storage buffer 50 in order to provide for seamless data output from the storage buffer 50. Further, the storage buffer 50 provides the seamless data output with few registers, which allows the storage buffer 50 to be manufactured at a lower cost than other storage buffers.
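The relationship between the retrieval rate, the output rate, and the delay may be illustrated with a small cycle-by-cycle model. The following Python sketch is an assumption-laden illustration, not the disclosed circuit: it assumes an integer ratio between the retrieval rate and the output rate, assumes that data retrieved in a given array cycle can begin to be output in that same cycle, and uses hypothetical names throughout.

```python
from typing import Tuple


def simulate(selects_per_burst: int, ratio: int, bursts: int,
             out_bits: int = 128) -> Tuple[int, bool]:
    """Model the bits held between retrieval and output, one array cycle at a time.

    ratio is the retrieval rate divided by the output rate (assumed to be an integer).
    Returns (peak number of bits held, True if the output never ran out of data).
    """
    read_bits = out_bits * ratio
    # Idle cycles needed so that total output time equals retrieval time plus delay.
    idle_cycles = selects_per_burst * (ratio - 1)
    pattern = [True] * selects_per_burst + [False] * idle_cycles
    held = 0
    peak = 0
    seamless = True
    for _ in range(bursts):
        for is_read in pattern:
            if is_read:
                held += read_bits          # a column select retrieves data
            peak = max(peak, held)
            if held < out_bits:
                seamless = False           # nothing available to output this cycle
            else:
                held -= out_bits           # one cycle's worth of data is output
    return peak, seamless


# Two column selects per burst, retrieval at twice the output rate, four bursts.
print(simulate(selects_per_burst=2, ratio=2, bursts=4))
```

In this model, two column selects at twice the output rate call for two idle cycles per burst, the output is never starved, and the amount of data held at any moment stays bounded, which is consistent with providing seamless output using few registers.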

The storage buffer 50 as described in the present application may be used in a variety of different applications. For example, the storage buffer 50 may be used in a state machine engine 206 illustrated in FIG. 6, which may operate under control of the processor 12 of FIG. 1. The state machine engine 206 may employ any one of a number of state machine architectures, including, but not limited to, Mealy architectures, Moore architectures, Finite State Machines (FSMs), Deterministic FSMs (DFSMs), Bit-Parallel State Machines (BPSMs), etc. Though a variety of architectures may be used, for discussion purposes, this application refers to FSMs. However, those skilled in the art will appreciate that the described techniques may be employed using any one of a variety of state machine architectures.

As discussed further below, the state machine engine 206 may include a number of (e.g., one or more) finite state machine (FSM) lattices 208. Each FSM lattice 208 may include multiple FSMs that each receive and analyze the same data in parallel. Further, the FSM lattices 208 may be arranged in groups (e.g., clusters), such that clusters of FSM lattices 208 may analyze the same input data in parallel. Further, clusters of FSM lattices 208 of the state machine engine 206 may be arranged in a hierarchical structure wherein outputs from state machine lattices 208 on a lower level of the hierarchical structure may be used as inputs to state machine lattices 208 on a higher level. By cascading clusters of parallel FSM lattices 208 of the state machine engine 206 in series through the hierarchical structure, increasingly complex patterns may be analyzed (e.g., evaluated, searched, etc.).

Further, based on the hierarchical parallel configuration of the state machine engine 206, the state machine engine 206 can be employed for pattern recognition in systems that utilize high processing speeds. For instance, embodiments described herein may be incorporated in systems with processing speeds of 1 GByte/sec. Accordingly, utilizing the state machine engine 206, data from high speed memory devices or other external devices may be rapidly analyzed for various patterns. The state machine engine 206 may analyze a data stream according to several criteria, and their respective search terms, at about the same time, e.g., during a single device cycle. The FSM lattices 208 within a cluster of FSMs on a level of the state machine engine 206 may each receive the same search term from the data stream at about the same time, and each of the parallel FSM lattices 208 may determine whether the term advances the state machine engine 206 to the next state in the processing criterion. The state machine engine 206 may analyze terms according to a relatively large number of criteria, e.g., more than 100, more than 1000, or more than 10,000. Because the FSM lattices 208 operate in parallel, they may apply the criteria to a data stream having a relatively high bandwidth, e.g., a data stream of greater than or generally equal to 1 GByte/sec, without slowing the data stream.

In one embodiment, the state machine engine 206 may be configured to recognize (e.g., detect) a great number of patterns in a data stream. For instance, the state machine engine 206 may be utilized to detect a pattern in one or more of a variety of types of data streams that a user or other entity might wish to analyze. For example, the state machine engine 206 may be configured to analyze a stream of data received over a network, such as packets received over the Internet or voice or data received over a cellular network. In one example, the state machine engine 206 may be configured to analyze a data stream for spam or malware. The data stream may be received as a serial data stream, in which the data is received in an order that has meaning, such as in a temporally, lexically, or semantically significant order. Alternatively, the data stream may be received in parallel or out of order and, then, converted into a serial data stream, e.g., by reordering packets received over the Internet. In some embodiments, the data stream may present terms serially, but the bits expressing each of the terms may be received in parallel. The data stream may be received from a source external to the system 10, or may be formed by interrogating a memory device, such as the volatile memory 26 or non-volatile memory 28, and forming the data stream from data stored in the memory 26, 28. In other examples, the state machine engine 206 may be configured to recognize a sequence of characters that spell a certain word, a sequence of genetic base pairs that specify a gene, a sequence of bits in a picture or video file that form a portion of an image, a sequence of bits in an executable file that form a part of a program, or a sequence of bits in an audio file that form a part of a song or a spoken phrase. The stream of data to be analyzed may include multiple bits of data in a binary format or other formats, e.g., base ten, ASCII, etc. The stream may encode the data with a single digit or multiple digits, e.g., several binary digits.

In an example, the FSM lattice 208 comprises an array of blocks. Each block may include a plurality of selectively couple-able hardware elements (e.g., programmable elements and/or special purpose elements) that correspond to a plurality of states in a FSM. Similar to a state in a FSM, a hardware element can analyze an input stream and activate a downstream hardware element, based on the input stream.

The programmable elements can be programmed to implement many different functions. For instance, the programmable elements may include state machine elements (SMEs) that are hierarchically organized into rows and blocks. To route signals between the hierarchically organized SMEs, a hierarchy of programmable switching elements can be used, including inter-block switching elements, intra-block switching elements, and intra-row switching elements. The switching elements may include routing structures and buffers. An SME can correspond to a state of a FSM implemented by the FSM lattice 208. Accordingly, a FSM can be implemented on the FSM lattice 208 by programming the SMEs to correspond to the functions of states and by selectively coupling together the SMEs to correspond to the transitions between states in the FSM.
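The correspondence between SMEs and states, and between couplings and transitions, may be illustrated in software. The following Python sketch is a simplified, hypothetical model: the `SME` class, its `start` and `reporting` flags, and the `run` function are assumptions made for illustration and do not describe the hardware elements of the FSM lattice 208.

```python
from typing import Dict, List, Set


class SME:
    """Hypothetical model of a state machine element that matches a set of symbols."""

    def __init__(self, symbols: str, start: bool = False, reporting: bool = False):
        self.symbols = set(symbols)
        self.start = start          # assumed: enabled for every input symbol
        self.reporting = reporting  # assumed: signals a match when it activates


def run(smes: Dict[str, SME], edges: Dict[str, List[str]], stream: str) -> List[int]:
    """Return the stream positions at which a reporting SME activates."""
    enabled: Set[str] = {name for name, sme in smes.items() if sme.start}
    matches = []
    for pos, symbol in enumerate(stream):
        # An SME activates when it is enabled and the input symbol is in its set.
        active = {name for name in enabled if symbol in smes[name].symbols}
        if any(smes[name].reporting for name in active):
            matches.append(pos)
        # Coupling: each active SME enables its downstream SMEs for the next symbol.
        enabled = {dst for name in active for dst in edges.get(name, [])}
        enabled |= {name for name, sme in smes.items() if sme.start}
    return matches


# Three SMEs coupled in a chain to recognize the character sequence "cat".
smes = {"c": SME("c", start=True), "a": SME("a"), "t": SME("t", reporting=True)}
edges = {"c": ["a"], "a": ["t"]}
print(run(smes, edges, "concatenate cat"))  # prints [5, 14]
```

Here, programming an SME corresponds to choosing its symbol set, and selectively coupling SMEs corresponds to filling in the `edges` mapping.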

As previously described, the state machine engine 206 is configured to receive data from a source, such as the volatile memory 26 and/or the non-volatile memory 28, over a data bus. In the illustrated embodiment, data may be sent to the state machine engine 206 through a bus interface, such as a DDR3 bus interface 210. The DDR3 bus interface 210 may be capable of exchanging data at a rate greater than or equal to 1 GByte/sec. As will be appreciated, depending on the source of the data to be analyzed, the bus interface 210 may be any suitable bus interface for exchanging data between a data source and the state machine engine 206, such as a NAND Flash interface, PCI interface, etc. As previously described, the state machine engine 206 includes one or more FSM lattices 208 configured to analyze data. Each FSM lattice 208 may be divided into two half-lattices. In the illustrated embodiment, each half-lattice may include 24K SMEs, such that the lattice 208 includes 48K SMEs. The lattice 208 may comprise any desirable number of SMEs. Further, while only one FSM lattice 208 is illustrated, the state machine engine 206 may include multiple FSM lattices 208, as previously described.

Data to be analyzed may be received at the bus interface 210 and transmitted to the FSM lattice 208 through a number of buffers and buffer interfaces. In the illustrated embodiment, the data path includes data buffers 212, process buffers 214 and an inter-rank (IR) bus and process buffer interface 216. The data buffers 212 are configured to receive and temporarily store data to be analyzed. In one embodiment, there are two data buffers 212 (data buffer A and data buffer B). Data may be stored in one of the two data buffers 212, while data is being emptied from the other data buffer 212, for analysis by the FSM lattice 208. In the illustrated embodiment, the data buffers 212 may be 32 KBytes each. The IR bus and process buffer interface 216 may facilitate the transfer of data to the process buffers 214. The IR bus and process buffer interface 216 ensures that data is processed by the FSM lattice 208 in order. The IR bus and process buffer interface 216 may coordinate the exchange of data, timing information, packing instructions, etc. such that data is received and analyzed in the correct order. Generally, the IR bus and process buffer interface 216 allows the analyzing of multiple data sets in parallel through logical ranks of FSM lattices 208.
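The alternating use of data buffer A and data buffer B may be sketched as follows. This Python example is a sequential, software-only illustration with hypothetical names; in hardware the filling and emptying may overlap in time, which this sketch does not attempt to model.

```python
from typing import Iterable, Iterator, List

BUFFER_BYTES = 32 * 1024  # example capacity of each data buffer


def ping_pong(chunks: Iterable[bytes]) -> Iterator[bytes]:
    """Alternate between two buffers: one is filled while the other is emptied."""
    buffers: List[bytes] = [b"", b""]  # stand-ins for data buffer A and data buffer B
    fill = 0                            # index of the buffer currently being filled
    have_full = False
    for chunk in chunks:
        if len(chunk) > BUFFER_BYTES:
            raise ValueError("chunk does not fit in one data buffer")
        buffers[fill] = chunk           # store incoming data in the filling buffer
        if have_full:
            yield buffers[1 - fill]     # empty the other buffer for analysis
        have_full = True
        fill = 1 - fill                 # swap the roles of the two buffers
    if have_full:
        yield buffers[1 - fill]         # empty the buffer that was filled last


print(list(ping_pong([b"first", b"second", b"third"])))
```

The data leaves the two buffers in the same order in which it arrived, which is the ordering property that the downstream analysis relies on.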

In the illustrated embodiment, the state machine engine 206 also includes a de-compressor 218 and a compressor 220 to aid in the transfer of the large amounts of data through the state machine engine 206. The compressor 220 and de-compressor 218 work in conjunction such that data can be compressed to minimize the data transfer times. By compressing the data to be analyzed, the bus utilization time may be minimized. In certain embodiments, a mask may be provided to the state machine engine 206 to provide information on which state machines are likely to be unused. The compressor 220 and de-compressor 218 can also be configured to handle data of varying burst lengths. By padding compressed data and including an indicator as to when each compressed region ends, the compressor 220 may improve the overall processing speed through the state machine engine 206. The compressor 220 and de-compressor 218 may also be used to compress and decompress match results data after analysis by the FSM lattice 208.

As previously described, the output of the FSM lattice 208 can comprise a state vector. The state vector comprises the state (e.g., activated or not activated) of programmable elements of the FSM lattice 208. Each state vector may be temporarily stored in the state vector cache memory 222 for further hierarchical processing and analysis. That is, the state of each state machine may be stored, such that the final state may be used in further analysis, while freeing the state machines for reprogramming and/or further analysis of a new data set. Like a typical cache, the state vector cache memory 222 allows storage of information, here state vectors, for quick retrieval and use, here by the FSM lattice 208, for instance. Additional buffers, such as the state vector memory buffer 224, state vector intermediate input buffer 226, and state vector intermediate output buffer 228, may be utilized in conjunction with the state vector cache memory 222 to accommodate rapid analysis and storage of state vectors, while adhering to packet transmission protocol through the state machine engine 206.
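Saving a state vector so that the state machines can be reused, and later restoring it, may be illustrated with a small software analogy. The following Python sketch is hypothetical: the `StateVectorCache` class, its slot numbering, and the element names are assumptions for illustration and do not describe the organization of the state vector cache memory 222.

```python
from typing import Dict, FrozenSet, Set


class StateVectorCache:
    """Hypothetical cache mapping a slot number to a saved state vector."""

    def __init__(self) -> None:
        self._slots: Dict[int, FrozenSet[str]] = {}

    def save(self, slot: int, active_elements: Set[str]) -> None:
        # Store an immutable snapshot so later changes to the lattice do not alter it.
        self._slots[slot] = frozenset(active_elements)

    def restore(self, slot: int) -> Set[str]:
        return set(self._slots[slot])


cache = StateVectorCache()
lattice_state = {"sme3", "sme7"}   # elements activated by a first data set
cache.save(0, lattice_state)
lattice_state.clear()              # lattice freed for analysis of a new data set
lattice_state = cache.restore(0)   # earlier state brought back for further analysis
print(sorted(lattice_state))       # prints ['sme3', 'sme7']
```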

Once a result of interest is produced by the FSM lattice 208, match results may be stored in a match results memory 230. That is, a “match vector” indicating a match (e.g., detection of a pattern of interest) may be stored in the match results memory 230. The match result can then be sent to a match buffer 232 for transmission over the bus interface 210 to the processor 12, for example. As previously described, the match results may be compressed.

Additional registers and buffers may be provided in the state machine engine 206, as well. For instance, the state machine engine 206 may include control and status registers 234. In addition, restore and program buffers 236 may be provided for use in programming the FSM lattice 208 initially, or restoring the state of the machines in the FSM lattice 208 during analysis. Similarly, save and repair map buffers 238 may also be provided for storage of save and repair maps for setup and usage.

As described, the state machine engine 206 includes many different buffers. As will be appreciated, any of the buffers described herein may include the features of the storage buffer 50 described above. For example, any of the following may include features of the storage buffer 50: the data buffers 212, the process buffers 214, the state vector memory buffer 224, the state vector intermediate input buffer 226, the state vector intermediate output buffer 228, the match buffers 232, the restore and program buffers 236, the save and repair map buffers 238, and so forth.

While the invention may be susceptible to various modifications and alternative forms, specific embodiments have been shown by way of example in the drawings and have been described in detail herein. However, it should be understood that the invention is not intended to be limited to the particular forms disclosed. Rather, the invention is to cover all modifications, equivalents, and alternatives falling within the spirit and scope of the invention as defined by the following appended claims. 

What is claimed is:
 1. A method for reading data from a storage buffer comprising: activating a first column select during a first period of time to retrieve a first set of data; activating a second column select during a second period of time, after the first period of time, to retrieve a second set of data; and delaying activation of a column select during a third period of time, after the second period of time, to inhibit additional data retrieval; wherein the third period of time is substantially the same as a sum of the first period of time and the second period of time.
 2. The method of claim 1, wherein the first period of time is less than 5 ns, the second period of time is less than 5 ns, or some combination thereof.
 3. The method of claim 1, wherein the first set of data comprises 256 bits, the second set of data comprises 256 bits, or some combination thereof.
 4. The method of claim 1, wherein the third period of time is less than 10 ns.
 5. The method of claim 1, wherein the first period of time is approximately 3.5 ns and the second period of time is approximately 3.5 ns.
 6. The method of claim 1, wherein the third period of time is approximately 7 ns.
 7. The method of claim 1, comprising: activating a third column select during a fourth period of time, after the third period of time, to retrieve a third set of data; activating a fourth column select during a fifth period of time, after the fourth period of time, to retrieve a fourth set of data; and delaying activation of a column select during a sixth period of time, after the fifth period of time, to inhibit additional data retrieval; wherein the sixth period of time is substantially the same as a sum of the fourth period of time and the fifth period of time.
 8. The method of claim 7, wherein the first, second, fourth, and fifth periods of time are each approximately 3.5 ns and the third and sixth periods of time are each approximately 7.0 ns.
 9. A method for reading data from a storage buffer comprising: retrieving a first set of data during a first period of time; delaying data retrieval during a second period of time after the first period of time; and outputting at least a portion of the first set of data during the first period of time and the second period of time; wherein the first period of time is substantially similar to the second period of time.
 10. The method of claim 9, wherein the first period of time and the second period of time are each approximately 7.0 ns.
 11. The method of claim 9, wherein retrieving the first set of data during the first period of time comprises retrieving approximately 512 bits of data.
 12. The method of claim 9, wherein the first set of data is retrieved at a retrieval rate greater than the data output rate of the storage buffer.
 13. The method of claim 9, comprising: retrieving a second set of data during a third period of time after the second period of time; delaying data retrieval during a fourth period of time after the third period of time; outputting at least a portion of the second set of data during the third period of time and the fourth period of time; retrieving a third set of data during a fifth period of time after the fourth period of time; delaying data retrieval during a sixth period of time after the fifth period of time; outputting at least a portion of the third set of data during the fifth period of time and the sixth period of time; retrieving a fourth set of data during a seventh period of time after the sixth period of time; delaying data retrieval during an eighth period of time after the seventh period of time; outputting at least a portion of the fourth set of data during the seventh period of time and the eighth period of time; wherein the third, fourth, fifth, sixth, seventh, and eighth periods of time are all substantially similar to each other.
 14. The method of claim 13, wherein the first, second, third, fourth, fifth, sixth, seventh, and eighth periods of time are each approximately 7.0 ns.
 15. The method of claim 13, wherein retrieving the first, second, third, and fourth sets of data comprises retrieving a sum of approximately 2048 bits of data.
 16. The method of claim 13, comprising precharging the storage buffer during a ninth period of time after the eighth period of time.
 17. A method for reading data from a storage buffer comprising: retrieving a first set of data at a retrieval rate greater than a data output rate of the storage buffer during a first period of time; and delaying data retrieval during a second period of time; wherein the sum of the first period of time and the second period of time is substantially the same as a third period of time and the third period of time is an amount of time it takes for the storage buffer to output the first set of data based on the data output rate.
 18. The method of claim 17, wherein the retrieval rate is approximately double the data output rate.
 19. The method of claim 17, wherein the data output rate is approximately 128 bits per 3.5 ns.
 20. A non-transitory tangible machine-readable medium having code stored thereon, the code comprising instructions for: activating a first column select during a first period of time to retrieve a first set of data; activating a second column select during a second period of time, after the first period of time, to retrieve a second set of data; delaying activation of a column select during a third period of time, after the second period of time, to inhibit additional data retrieval; and outputting the first set of data over a fourth period of time that is substantially the same as a sum of the first period of time and the second period of time.
 21. The non-transitory tangible machine-readable medium of claim 20, wherein the third period of time is substantially the same as the fourth period of time.
 22. The non-transitory tangible machine-readable medium of claim 20, wherein the code comprises instructions for clocking out data during the first period of time, the second period of time, and the third period of time.
 23. A device, comprising: a state machine comprising a storage buffer configured to retrieve a first set of data at a retrieval rate greater than a data output rate of the storage buffer during a first period of time and delay data retrieval during a second period of time, wherein the sum of the first period of time and the second period of time is substantially the same as a third period of time and the third period of time is an amount of time it takes for the storage buffer to output the first set of data to the state machine based on the data output rate.
 24. The device of claim 23, wherein the state machine comprises a state machine lattice comprising a plurality of data analysis elements and each data analysis element comprises a plurality of memory cells configured to analyze at least a portion of a data stream and to output a result of the analysis.
 25. The device of claim 23, wherein the state machine comprises a data buffer system having the storage buffer.
 26. The device of claim 23, wherein the state machine comprises a process buffer having the storage buffer.
 27. The device of claim 23, wherein the state machine comprises a state vector buffer having the storage buffer.
 28. The device of claim 23, wherein the state machine comprises a program buffer system having the storage buffer.
 29. The device of claim 23, wherein the state machine comprises a repair buffer system having the storage buffer.
 30. The device of claim 23, wherein the state machine comprises a match buffer having the storage buffer. 